The Megaprior Heuristic for Discovering Protein Sequence Patterns

Authors

  • Timothy L. Bailey
  • Michael Gribskov
Abstract

Several computer algorithms for discovering patterns in groups of protein sequences are in use that are based on fitting the parameters of a statistical model to a group of related sequences. These include hidden Markov model (HMM) algorithms for multiple sequence alignment, and the MEME and Gibbs sampler algorithms for discovering motifs. These algorithms are sometimes prone to producing models that are incorrect because two or more patterns have been combined. The statistical model produced in this situation is a convex combination (weighted average) of two or more different models. This paper presents a solution to the problem of convex combinations in the form of a heuristic based on using extremely low variance Dirichlet mixture priors as part of the statistical model. This heuristic, which we call the megaprior heuristic, increases the strength (i.e., decreases the variance) of the prior in proportion to the size of the sequence dataset. This causes each column in the final model to strongly resemble the mean of a single component of the prior, regardless of the size of the dataset. We describe the cause of the convex combination problem, analyze it mathematically, motivate and describe the implementation of the megaprior heuristic, and show how it can effectively eliminate the problem of convex combinations in protein sequence pattern discovery.
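The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the four-letter alphabet, the two prior components, the counts, and the scaling constant `k` are all hypothetical, and the rule "set each component's total strength to k times the number of sequences" is only one assumed reading of "in proportion to the size of the dataset". The sketch computes the standard posterior mean estimate of one model column under a Dirichlet mixture prior, first with a weak prior and then with a megaprior-style scaled prior.

```python
import math


def log_beta(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def posterior_mean(counts, mix_weights, alphas):
    """Posterior mean letter frequencies for one column under a Dirichlet mixture prior."""
    # Log posterior weight of each component: q_j * B(alpha_j + n) / B(alpha_j)
    log_post = []
    for q, alpha in zip(mix_weights, alphas):
        updated = [a + n for a, n in zip(alpha, counts)]
        log_post.append(math.log(q) + log_beta(updated) - log_beta(alpha))
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    norm = sum(weights)
    weights = [w / norm for w in weights]

    # Mix the per-component Dirichlet posterior means.
    total = sum(counts)
    estimate = [0.0] * len(counts)
    for w, alpha in zip(weights, alphas):
        denom = total + sum(alpha)
        for i, (a, n) in enumerate(zip(alpha, counts)):
            estimate[i] += w * (a + n) / denom
    return estimate


def scale_prior(alphas, n_sequences, k=10.0):
    """Assumed megaprior-style scaling: keep each component's mean,
    set its total strength to k * n_sequences (k is a hypothetical constant)."""
    scaled = []
    for alpha in alphas:
        strength = sum(alpha)
        scaled.append([k * n_sequences * a / strength for a in alpha])
    return scaled


# Hypothetical two-component prior over a toy 4-letter alphabet.
mix_weights = [0.5, 0.5]
alphas = [
    [0.85, 0.05, 0.05, 0.05],  # component favouring letter 0
    [0.05, 0.85, 0.05, 0.05],  # component favouring letter 1
]

# A column whose counts blend two patterns (a convex combination).
counts = [60, 40, 0, 0]
n_sequences = sum(counts)

weak_est = posterior_mean(counts, mix_weights, alphas)
mega_est = posterior_mean(counts, mix_weights, scale_prior(alphas, n_sequences))

print("weak prior:", [round(p, 3) for p in weak_est])
print("megaprior: ", [round(p, 3) for p in mega_est])
```

With the weak prior the estimate stays close to the blended counts, whereas with the scaled prior it moves close to the mean of the single component that best explains the column, which is the behaviour the abstract attributes to the heuristic.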

Similar resources

Separating Mixtures Using Megapriors

Fitting the parameters of a discrete finite mixture distribution to a set of data using the EM algorithm can be extremely difficult when the likelihood surface has many local maxima, the form of the components is unknown, and the number of components is unknown. The exponential explosion in the number of different models and different starting points for EM which must be tested can be reduced by findin...

Project scheduling optimization for contractor’s Net present value maximization using meta-heuristic algorithms: A case study

Today's competitive conditions have caused the projects to be carried out in the least possible time with limited resources. Therefore, managing and scheduling a project is a necessity for the project. The timing of a project is to specify a sequence of times for a series of related activities. According to their priority and their latency, so that between the time the project is completed and ...

A heuristic approach for multi-stage sequence-dependent group scheduling problems

We present several heuristic algorithms based on tabu search for solving the multi-stage sequence-dependent group scheduling (SDGS) problem by considering minimization of makespan as the criterion. As the problem is recognized to be strongly NP-hard, several meta (tabu) search-based solution algorithms are developed to efficiently solve industry-size problem instances. Also, two different initi...

iProsite: an improved prosite database achieved by replacing ambiguous positions with more informative representations

PROSITE database contains a set of entries corresponding to protein families, which are used to identify the family of a protein from its sequence. Although patterns and profiles are developed to be very selective, each may have false positive or negative hits. Considering false positives as items that reduce the selectiveness of a pattern, then, the more selective pattern we have, a more accur...

Meta heuristic for Minimizing Makespan in a Flow-line Manufacturing Cell with Sequence Dependent Family Setup Times

This paper presents a new mathematical model for the problem of scheduling part families and jobs within each part family in a flow line manufacturing cell where the setup times for each family are sequence dependent and it is desired to minimize the maximum completion time of the last job on the last machine (makespan) while processing parts (jobs) in each family together. Gaining an optimal s...

Journal:
  • Proceedings. International Conference on Intelligent Systems for Molecular Biology

Volume 4  Issue 

Pages  -

Publication date 1996